The Causal Effects of Grade Retention on Behavioral Outcomes


Abstract 

This study examines the impact of grade retention on behavioral outcomes under a 
comprehensive assessment-based student promotion policy in New York City. To isolate the 
causal effect of grade retention, we implement a fuzzy regression discontinuity (RD) design that 
exploits the fact that grade retention is largely determined by whether a student scores below a
cutoff on a standardized test. We use data on students subject to the policy over a nine-year span to
examine impacts on attendance and disciplinary event outcomes. We do not find evidence of 
systematic effects of retention on behavioral outcomes in either direction. We do find sporadic 
non-sustained significant effects of retention on behavioral outcomes. When present, these 
isolated non-persistent effects tend to be beneficial when found for retained elementary school
students and mixed for retained middle school students.
The Causal Effects of Grade Retention on Behavioral Outcomes

The widespread availability of standardized assessment scores resulting from the 
adoption of test-based accountability policies, such as the No Child Left Behind Act of 2001, has 
made it possible to base grade promotion decisions in large part on standardized test 
performance. Marsh, Gershwin, Kirby, and Xia (2009) reported that, as of 2009, six states and 13
large school districts had implemented assessment-based student promotion policies. More
recently, Workman (2014) reported that 16 states had a standards-based promotion policy in
place for third grade reading alone as of December 2014.

The theory behind test-based promotion policies is that providing an additional year of 
instruction in the current grade affords struggling students the opportunity to improve their 
proficiency in the curriculum of the current grade, which in turn enables students to fully engage 
in the curriculum of subsequent grades. Given that it extends a student’s education timeline by a 
full year, grade retention is an intensive remedial intervention. One estimate suggests the total 
cost of retaining students in the United States exceeds $12 billion annually (West, 2012). Grade 
retention is also very controversial for reasons that go beyond its monetary costs. Critics argue 
that grade retention is punitive in nature, and that being a year behind one’s peers may result in 
disengagement with school (Jackson, 1975; Roderick, 1994). 

Such disengagement could manifest itself through an increase in behavioral problems
(Byrd, Weitzman, & Auinger, 1997). Developmental psychologists have shown that
stressful life events are correlated with behavioral problems (Compas, 1987), and have also
found that students see grade retention as one of the most stressful life events that can befall
them (Anderson, Jimerson, & Whipple, 2005). Indeed, a large literature summarized in the
meta-analyses of Holmes (1989) and Jimerson (2001) has documented a strong association between
grade retention and worse academic, socioemotional, and behavioral outcomes.¹ Partly in
response to these results on behavioral outcomes and related findings from developmental 
psychology, professional organizations such as the American Federation of Teachers (1997) and 
the National Council of Teachers of English (2015) have come out against grade retention 
policies. Such behavioral problems resulting from grade retention would be worrisome given 
recent evidence linking behavioral problems in school to significantly lower wages and earnings 
in adulthood (Segal, 2013). 

On the other hand, it is also possible that grade retention could reduce behavioral
problems. For instance, if retained students compare themselves to their younger peers during
the repeated year, their academic self-concept may improve.² Along the same lines, if grade
retention delivers the academic benefits its proponents claim, behavior may
improve as a result of being held back.

Motivated by this theoretical ambiguity, this study examines whether grade retention 
affects behavioral problems using administrative data on students subject to a comprehensive 
assessment-based student promotion policy instituted by the New York City Department of
Education (NYCDOE).³ Specifically, we test the hypothesis that grade retention affects two
measures of behavioral problems: absenteeism and suspensions. Absenteeism is a useful measure
for this study because it is reflective of student engagement in school (Rohrman, 1993).


1 For the aggregated effect sizes in the prior literature, a negative effect size is an effect in favor of the promoted
students. Jimerson (2001) reports an aggregated negative meta-analytic effect size of retention on behavior
(non-attendance) of -0.11 across 30 outcomes in 11 studies and an aggregated effect size of -0.65 on attendance outcomes
across two studies. Holmes (1989) reports a comparable effect size of -0.13 on behavioral (non-attendance)
outcomes, but a much smaller yet still disadvantageous effect size of -0.18 on attendance outcomes.

2 A theoretical rationale for this perspective comes from social comparison theory (Festinger, 1954; Wu, West, &
Hughes, 2010), which posits that students draw conclusions about their ability from environmental cues, such as 
comparisons to peers. See Martin (2011) for a discussion of other theoretical frameworks with relevance for the link 
between grade retention and student outcomes, including absenteeism. 

3 Previous studies have examined impacts of this policy on assessment outcomes (Mariano et al., 2009; Mariano & 
Martorell, 2013). 
Suspensions, on the other hand, are a direct measure of a disciplinary response to behavioral
infractions, and prior work has noted a correlation between grade retention and suspensions 
(Raffaele, 1999). Examining suspensions is also interesting given recent concerns about high 
rates of suspension and racial disparities in suspension rates (Losen & Martinez, 2013). 

Estimating the effect of grade retention is difficult because retained students would
likely have had worse outcomes than promoted students even without retention: they are retained
precisely because of their difficulties in school. As documented in more recent literature, associations reported
in the meta-analyses of Holmes (1989) and Jimerson (2001) may not reflect causal effects 
(Alexander, Entwisle, & Dauber, 2003; Allen, Chen, Willson, & Hughes, 2009; Hong & 
Raudenbush, 2005; Lorence et al., 2002; Mariano & Martorell, 2013). We address this concern 
by using an empirical strategy that exploits the fact that grade retention is triggered largely by 
whether a student scores below a threshold on a standardized test. This allows us to use a
fuzzy regression discontinuity (FRD) research design (Hahn, Todd, & Van der Klaauw, 2001; 
Imbens & Lemieux, 2008) that centers on comparisons of students who score just above or just 
below this cutoff. Under plausible and empirically supported assumptions (which we describe in 
detail below), this approach will generate valid estimates of the causal impact of grade retention 
for students scoring close to the cutoff score whose grade retention status is determined by 
falling above or below the cutoff score. 

This study builds on a number of recent studies that use a similar research design to
look at the impact of grade retention on student outcomes. This research has generated mixed 
results, with some studies (Jacob & Lefgren, 2004; Roderick & Nagaoka, 2005) finding small 
and short-lived benefits of grade retention, and others finding larger and more persistent positive
effects (Greene & Winters, 2007; Mariano, Kirby, & Crego, 2009; Mariano & Martorell, 2013;
Schwerdt, West, & Winters, 2017). Other studies examining the impact of grade retention on
school completion have found that grade retention, particularly in later grades, increases dropout 
(Jacob & Lefgren, 2009; Manacorda, 2012). 

However, we are aware of only one other study that attempts to estimate the causal effect
of grade retention on behavioral outcomes. Ozek (2015) uses a fuzzy RD design to examine the 
impact of retention under Florida’s third grade retention policy. He finds that retention generates 
a small increase in the likelihood of disciplinary incidents in the first two years post-retention, 
followed by a small decrease in such incidents in the third year. Our study builds on this analysis 
in two ways. First, we examine the effects of grade retention in a setting where the “marginal” 
retained student is at a lower point in the achievement distribution since a smaller proportion of 
students in New York City meet the criteria for assessment-based retention decisions than in 
Florida. Thus, our results provide new information about the effect of grade retention for 
students in need of the greatest academic assistance, which is a population of considerable 
interest. 

A second contribution of the study is that the NYCDOE policy covered grades 3-8,
allowing us to examine how effects vary by grade. These analyses are motivated by 
developmental theories which posit that stressful life events such as grade retention may have 
different effects depending on a child’s developmental stage (Anderson et al., 2005; Compas, 
1987). In particular, the socially disruptive, stigmatizing, and academic effects all could vary by 
age, and hence by grade. Examining effects for each grade separately rather than, for instance, grouping
elementary and middle school grades, allows us to assess impacts in transition grades to middle
or high school. The findings from earlier studies also provide further rationale for examining
effects by grade, with more negative effects on outcomes such as high school completion found
for later grades than earlier grades (Jacob & Lefgren, 2009; Manacorda, 2012; Schwerdt et al.,
2017). This turns out to be important since we find some evidence that the effects of retention 
vary across grades. 

Our analyses reveal no persistent systematic effects of student grade retention on 
disciplinary incidents and attendance. In the grade-specific analyses, we do find sporadic non- 
sustained significant effects of retention on behavioral outcomes. When present, these isolated 
non-persistent effects tend to be beneficial when found for retained elementary school students 
and mixed for retained middle school students. 

Below we examine the broader NYCDOE promotion policy under study in more detail, 
including both retention and other treatments, and describe the available data. Next we describe 
our fuzzy regression discontinuity approach to estimating policy effects on behavioral outcomes. 
We then review our results and present a concluding discussion. 

New York City’s Student Promotion Policy 

In the 2003-2004 school year, NYCDOE implemented an ambitious reform initiative that 
included a new assessment-based promotion policy for general education students in grade 3 
(Mariano & Martorell, 2013; McCombs, Kirby, Marsh, & DiMartino, 2009). This policy was 
extended to grade 5 in the fall of 2004, to grade 7 in 2006-2007, to grade 8 in 2008-2009, and 
finally to grades 4 and 6 in 2009-2010, and it remained in place through the 2012-2013 school
year. Students in charter schools, special education students and early English language learners 
were exempt from the policy. 

This policy mandated reliance on standardized test scores for grade promotion decisions. 
Students who scored in the lowest performance level (Level 1)⁴ on either the math or English
Language Arts (ELA) annual spring assessments were at risk of being retained in grade under the
policy.

4 There are four levels on the New York assessment reporting scale. Level 3 indicates proficiency.

However, students who scored Level 1 in the spring had multiple additional opportunities
to demonstrate eligibility for promotion (Mariano & Martorell, 2013; McCombs, Naftel, 
Ikemoto, DiMartino, & Gershwin, 2009), starting with a portfolio review at the end of the 
regular school year in June. Those who did not demonstrate Level 2 on the June portfolio were 
required to attend the City's summer instructional program, which consisted of approximately
twenty 4.5-hour instructional days, including at least 1.5 hours in each of ELA and math per day
(see Ikemoto, McCombs, DiMartino, and Naftel (2009) for a full description of summer school
implementation). 

At the conclusion of the summer program, these students took a City summer assessment 
in the subjects in which they had scored at Level 1 in the spring. Students still scoring Level 1 on
the summer assessment were eligible for retention. The summer assessment program was a 
continuation of NYCDOE’s prior spring testing program that predated NCLB state testing 
requirements. In 2001, when the State of New York spring assessments covered only fourth and 
eighth grades, NYCDOE contracted with CTB McGraw Hill to create criterion-referenced spring
assessments for third, fifth, sixth, and seventh grades aligned to the state curriculum, such that
the NYCDOE had a contiguous spring testing program for grades three through eight.⁵,⁶ When
the promotion policy started in 2004, an alternate form of the spring NYCDOE assessment was
used as the end-of-summer assessment. When the State began testing all grades three through
eight in 2006, NYCDOE continued its curriculum-aligned assessment program for use as the
end-of-summer assessment.⁷


5 Harcourt Assessment took over the annual ELA assessment development in 2003. By 2010, both tests were 
eventually contracted to Pearson. 

6 A standard setting exercise to develop cuts for level scores relative to curriculum standards was conducted in 2001;
in subsequent years, cut scores were determined through the equating of scale scores within grade across years. A new
standard setting exercise was conducted in 2010, which updated the cut scores and the reporting scale.

7 When the policy was extended to grades four and eight, aligned versions of the City assessments were added. 
Students scoring Level 2 on the summer assessment had the opportunity for a second
portfolio review in August. Those not demonstrating Level 2 or better performance by the end of
this process were retained in grade unless the student's principal and community superintendent
granted an exemption.⁸ Despite multiple opportunities to meet the promotion criteria, below we
show clear evidence that scoring below the summer assessment Level 2 cutoff on either math or 
ELA sharply increased the likelihood of being retained. 

Grade retention was just one of the treatments students might have experienced under this 
comprehensive promotion policy. The policy complemented the threat of retention by placing 
considerable emphasis on an early intervention structure for struggling students. Kirby, 
McCombs, and Mariano (2009) note that the provision of supplemental services for struggling 
students, along with multiple opportunities to demonstrate Level 2 performance, placed 
NYCDOE’s promotion policy in alignment with tenets for the validity and fairness of
test-based promotion decisions articulated by the National Research Council (Heubert & Hauser,
1999). 

At the beginning of each school year, schools identified students who were potentially at 
risk of retention. This group included students retained the prior year and students scoring at 
Level 1 or in the lower half of Level 2 on a prior year spring assessment. These students received 
a variety of additional instructional services both within and outside the classroom during the 
school day, as well as additional services outside of regular school hours (Ikemoto et al., 2009;
McCombs et al., 2009). As detailed in the Methods section below, our research design focuses
on comparisons of students with summer assessment scores at the boundary between Level 1 and
Level 2. All students contributing to the identification of the effects of retention, whether
promoted or retained, had a Level 1 spring assessment score in math, ELA, or both and were
eligible for the same supplemental services.⁹

8 Eighth grade students also had to pass core courses in ELA, math, science, and social studies to be promoted. In
practice, it was highly uncommon for eighth grade students to pass a core course while scoring Level 1 on the spring
assessment of the same subject.

The provision of the services, however, is an
important contextual factor in evaluating the impact of retention under this policy. The effects of 
retention under this policy discussed below must be interpreted in that context; in the absence of 
the broader policy supports, the effects of retention may differ. 
Data 
Data Sources 

NYCDOE has provided relevant administrative data for all cohorts of students from 
2003-2004 through 2011-2012 in all grades subject to the promotion policy. Each cohort 
contains approximately 54,000 to 63,000 general education students subject to the promotion 
policy. Available baseline data include student characteristics, including, but not limited to, 
English language learner status, free or reduced lunch status, gender, and race/ethnicity; baseline 
and prior years’ spring and summer ELA and mathematics assessment scores; baseline and prior 
years’ attendance and suspension information; summer attendance record; school attended; and 
portfolio data for select cohorts. Behavioral outcome data include attendance and suspension 
data for each year post baseline through 2013-2014. The available suspension data include 
information on both the length of the suspension and the severity of an incident. 

The summer assessment results are critical measures that trigger retention eligibility. As 
detailed by Mariano and Martorell (2013), the spring assessments determine eligibility for a 
separate treatment, summer instruction. The summer assessment determines retention, unless 
avoided through summer portfolio or other exemption. There were a number of changes to the 
scales used for the summer assessments during the study period, necessitating rescaling to a
common metric to enable cross-cohort analyses.

9 Promoted students would have received these services in the next grade; services would have been tailored to their
needs. Promoted eighth grade students would not have received the supplemental services in high school.

Because the summer assessment is only
administered to a non-random subset of the reference population, using the empirical
standard deviation of the scores in a z-score transformation to place them on a common scale
would inflate the spacing of the scores. More importantly, because of the testing drift present in
the state spring test scores between 2007 and 2009, which necessitated recalibration of the state spring
test in 2010 (New York State Board of Regents, 2010), the subsets of the population taking the
summer assessments in those years are not expected to be comparable, so the resulting z-scores would
not be exchangeable over that time. As an alternative, we rescale the scores by dividing by the width of
Level 2 on the reporting scale for each grade and year and centering so that zero represents the 
treatment cut point; we reference this as “Level 2 scaling” below. Dividing by the Level 2 width 
is a desirable transformation because it retains the relative distance between scale scores; the 
assessment designers used the width of Level 2 to equate scores across years (Pearson 
Psychometric and Research Services, 2011).¹⁰ To accommodate both ELA and math summer
assessment scores determining eligibility, we use the minimum of the two Level 2 scaled scores 
as the running variable (i.e., the variable determining treatment assignment eligibility) in the RD 
analyses, as further discussed below. 
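The rescaling described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the cut-score values are hypothetical, the Level 2 width is assumed to be the distance between the Level 2 and Level 3 cut scores, and the running variable is assumed to be the minimum over whichever subject scores are available for a student.

```python
from typing import Optional


def level2_scaled(score: float, level2_cut: float, level3_cut: float) -> float:
    """Center a scale score at the Level 2 cut and divide by the Level 2 width."""
    width = level3_cut - level2_cut  # assumed definition of the Level 2 width
    if width <= 0:
        raise ValueError("level3_cut must exceed level2_cut")
    return (score - level2_cut) / width


def running_variable(ela: Optional[float], math_score: Optional[float]) -> float:
    """Minimum of the available Level 2 scaled scores (assumption: a student
    may have a summer score in only one subject)."""
    available = [s for s in (ela, math_score) if s is not None]
    if not available:
        raise ValueError("at least one summer score is required")
    return min(available)


# Hypothetical cut scores for one grade and year (illustrative values only).
ela_scaled = level2_scaled(score=610.0, level2_cut=620.0, level3_cut=660.0)
math_scaled = level2_scaled(score=655.0, level2_cut=640.0, level3_cut=690.0)
x = running_variable(ela_scaled, math_scaled)
below_cutoff = x < 0  # True when the student is below Level 2 in at least one subject
```

Taking the minimum means a student falls below zero whenever either subject score is below its Level 2 cut, which matches the rule that scoring Level 1 in either subject triggers retention eligibility.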
Outcomes 

In this study we examine impacts of grade retention on behavioral problems. Specifically, 
we estimate effects of retention on measures of suspensions and school attendance. For 
suspensions, we consider whether a student has any suspensions as well as the number of days
suspended. We also consider these measures separately for suspensions for less- and more-severe
disciplinary infractions.

10 Note that in analyses that pool across cohorts and grades, the scaling we use could generate slight differences in
the set of students close to the cutoff than an alternative scaling (e.g., z-score transformation). However, the results
are similar in models that do and do not include grade-by-cohort effects, suggesting that this issue is unlikely to
appreciably affect the results.

Infractions are classified by NYCDOE into 5 categories. We define a
suspension as a “less severe” suspension if it was for an infraction deemed as one of the 3
lower-level category infractions (“uncooperative/noncompliant behavior”, “disorderly behavior”, or
“disruptive behavior”) and a “more severe” suspension if it was for one of the two highest
infraction levels (“aggressive or injurious/harmful behavior”, “seriously dangerous or violent
behavior”). For attendance, we examined effects on the attendance rate (defined as the number of
days a student was in attendance divided by the number of days in the year in which a student 
was enrolled) as well as an indicator for “chronic absence”, which the NYCDOE defines as 
having an attendance rate of less than 90 percent. The behavioral outcome measures are available 
for each year post baseline through 2013-2014. 
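The outcome definitions above can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, and coding the five NYCDOE infraction categories as integers 1 through 5 (lowest to highest severity) is an assumption made for the example.

```python
CHRONIC_ABSENCE_THRESHOLD = 0.90  # attendance rate below 90 percent, per the text


def attendance_rate(days_present: int, days_enrolled: int) -> float:
    """Days in attendance divided by days enrolled during the year."""
    if days_enrolled <= 0:
        raise ValueError("days_enrolled must be positive")
    return days_present / days_enrolled


def chronically_absent(rate: float) -> bool:
    """Indicator for an attendance rate strictly below 90 percent."""
    return rate < CHRONIC_ABSENCE_THRESHOLD


def suspension_severity(infraction_level: int) -> str:
    """Map an infraction level (assumed coding: 1-5) to the two severity groups."""
    if infraction_level not in (1, 2, 3, 4, 5):
        raise ValueError("infraction_level must be between 1 and 5")
    return "less severe" if infraction_level <= 3 else "more severe"


# A student present 160 of 180 enrolled days has a rate of about 0.889.
rate = attendance_rate(days_present=160, days_enrolled=180)
flag = chronically_absent(rate)
```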

An important issue for an analysis of the impacts of grade retention is the timing of when 
outcomes are measured. As we are examining non-academic outcomes, we follow the example in 
several recent studies on the impacts of grade retention (Greene & Winters, 2007; Roderick & 
Nagaoka, 2005; Schwerdt et al., 2017) and conduct “same-age comparisons.” That is, we 
measure outcomes for promoted and retained students at the same point in time following the 
summer assessment when they are in different grades. 

Appendix Table A1 shows means for the outcomes by grade. Examining these patterns is
important because it helps to shed light on whether suspension and attendance are meaningful for
younger students. Both suspensions and attendance vary strongly by grade, with higher
attendance and lower suspension rates for younger students. This variation across grades in part
motivates our analysis of the retention effects by grade. Despite the low suspension rates in
elementary school (especially for third graders), suspensions occur in every grade, and
therefore we think suspension is a useful indicator of behavioral problems even for the youngest
students in our sample. Similarly, even for younger students, the attendance rate is well below
100 percent, and the incidence of chronic absenteeism is over 30 percent for all grades.
Analytic Sample 

Our analysis focuses on students who were subject to New York City’s promotion policy 
and who took the summer assessment in the period 2004-2011 after receiving a Level 1 spring
score. We further restrict the sample to those who have valid scores for at least one subject since 
our research design uses summer assessment scores as the running variable in the FRD 
estimation. There are 92,425 students meeting these criteria. 

For all analyses, we further limit the sample to students who are enrolled in a New York 
City public school in the fall following the summer assessment and for whom we observe the 
grade in that year. This restriction applies to about 3 percent of the students taking the summer 
assessment and is necessary so that we can determine which students were retained. The final 
size of the analytic sample is also determined by the bandwidth chosen to fit the FRD model. For 
example, using a bandwidth of 1 results in an analytic sample of 76,167 students across all
grades and cohorts subject to the policy. For analyses of outcomes two and three years following
the summer assessment, we also limit the sample to students who were enrolled in those years.¹¹
Also note that for all analyses, we pool across all available cohorts subject to the promotion 
policy in the relevant grade. 

Summary Statistics 

Table 1 shows descriptive statistics for our analysis sample. Sixty-six percent of students
who take the summer assessment are promoted. In the year prior to the summer assessment,
about 8 percent of students are suspended at least once and on average spend 1.2 days suspended.


11 One potential problem with restricting the sample to students who were enrolled in future years is that it may
impart a sample selection bias if future enrollment in a New York City school is affected by the outcome of the 
summer assessment. We discuss this issue below. 
The daily attendance rate is 90 percent, which is low considering that the NYCDOE’s definition of
“chronic absence” is less than 90 percent attendance. Forty-nine percent of the students in our
sample are black and 39 percent are Hispanic. In contrast, in NYCDOE as a whole in the 2012-2013
school year, only 29 percent of students were black, while 40 percent were Hispanic. About
five percent of students are not enrolled in a NYCDOE school 2 years after the summer
assessment and 9 percent are not enrolled 3 years after the summer assessment.

When comparing promoted and retained students, we see some notable differences on 
some characteristics but also a number of similarities. Most notably, promoted students have 
much better math and ELA scores in the spring prior to the summer assessment. Retained 
students are also 4 percentage points more likely to be black and 2 percentage points less likely 
to be Hispanic. There are not large differences in the attendance rates and proportion suspended 
in the year prior to the summer assessment; however, retained students averaged an additional 
0.2 days suspended, 14 percent higher than the promoted students in the analysis sample. 
Attrition 3 years after baseline is nearly identical for retained and promoted students. 

Sample sizes for the first year post-retention are indicated in Table 4. Third grade 
contains the most cohorts in the sample and, by far, the highest sample size at 25,898 students 
across all cohorts. Grade four contains the fewest students in the sample, 4,753, which is not 
surprising given the policy was implemented in grade four six years after the policy started in 
grade three. 

Methods 
Research Design 
Our identification strategy rests on comparisons of students who were barely above and
barely below the Level 2 cutoff on the summer assessment. While students scoring below Level
2 are likely to be different in many ways from those scoring Level 2 or higher, these differences
are likely to be much smaller among those students scoring close to the Level 2 cutoff in all 
dimensions other than the probability of being retained. Therefore, we use a fuzzy regression 
discontinuity design (Hahn et al., 2001; Imbens & Lemieux, 2008) centering on comparisons 
between students who score just above and below the Level 1 summer assessment cutoff to 
identify the effect of grade retention. 

Specifically, we estimate the following system of equations: 


(1)    $Y_i = \theta W_i + f_Y(X_i) + \varepsilon_i$

       $W_i = \pi T_i + f_W(X_i) + \nu_i$

where $Y_i$ represents an outcome for student $i$, $W_i$ denotes grade retention status, $f_Y(X_i)$ and
$f_W(X_i)$ are control functions of the summer assessment score, $X_i$, and $\varepsilon_i$ and $\nu_i$ are residuals.
The variable $T_i$ is a dummy variable for scoring below the Level 2 cutoff, and serves as the
instrumental variable (IV) for $W_i$. The parameter $\theta$ represents the effect of grade retention.¹²
While not shown, we also control for baseline variables.¹³ Doing so is not necessary for
producing consistent estimates of the impact of retention (provided the fuzzy RD identification 
assumptions listed below hold); however, it does reduce residual variance and therefore 
improves precision. 
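To make explicit why the coefficient on retention can be recovered from discontinuities at the cutoff, the two equations in (1) can be combined into a reduced form. The following is a sketch under the stated model; the symbols $\gamma$, $g$, and $u_i$ are introduced here for illustration and are not part of the original notation.

```latex
% Substitute the first-stage equation for W_i into the outcome equation.
\begin{align*}
Y_i &= \theta\,[\pi T_i + f_W(X_i) + \nu_i] + f_Y(X_i) + \varepsilon_i \\
    &= \underbrace{\theta\pi}_{\gamma}\, T_i
       + \underbrace{\theta f_W(X_i) + f_Y(X_i)}_{g(X_i)}
       + \underbrace{\theta\nu_i + \varepsilon_i}_{u_i}, \\
\theta &= \frac{\gamma}{\pi} \qquad (\pi \neq 0).
\end{align*}
```

The effect of retention is therefore the jump in the outcome at the cutoff ($\gamma$) divided by the jump in the probability of retention at the cutoff ($\pi$), which is well defined only when the latter is nonzero.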

The validity of this approach rests on two key conditions. The first is that barely falling
below the Level 2 cutoff affects the probability of retention (i.e., $\pi \neq 0$), which we show below
clearly holds in this context. The second is that barely falling above or below the Level 1 cutoff
only affects $Y_i$ by changing the probability of being retained (i.e., that $T_i$ and $\varepsilon_i$ are
uncorrelated).

12 Note that this analytic design makes no assumptions about the mechanism by which grade retention may or may
not affect students’ behavioral outcomes.

13 These covariates include grade-by-cohort dummies, English Language Learner status, race, gender, old for grade
status, any days suspended, days suspended, attendance rate, missing attendance rate, spring math z-score, spring
ELA z-score, missing math z-score, and missing ELA z-score.

This condition requires that whether a student falls above or below the Level 2
cutoff is “as good as” randomly assigned in a narrow region around the cutoff score (Lee, 2008). 
We discuss evidence consistent with this assumption below. This second condition also requires 
that the only mechanism through which falling below the Level 2 cutoff affects student outcomes 
is by changing the likelihood of being retained. This seems plausible; the only purpose of the 
summer assessment is to determine whether students in the summer school program have 
mastered enough material to be promoted to the next grade. One important potential threat to validity is
differential attrition from the sample following the summer assessment. We present
evidence below on this issue when discussing the validity of the research design. 

When these assumptions hold, our research design will capture the “local” average effect 
for students whose academic ability makes them likely to score close to the Level 2 cutoff and 
whose retention status is affected by scoring below or above the Level 2 cutoff (Imbens & 
Angrist, 1994; Lee, 2008).¹⁴ Our estimates will therefore not necessarily be applicable to
students with scores very far away from the Level 2 cutoff or to students who avoid being
retained even when scoring below Level 2. Nonetheless, they will still be informative about the
effects for a policy-relevant subgroup. This is because the students scoring near the Level 2
cutoff are those deemed by NYCDOE to be just at the margin for needing and benefiting from 
retention. Similarly, a central feature of the policy was that retention be based in part on the 
outcome of standardized tests, and our estimates are applicable for students whose grade 
retention status is determined by test score performance. 

14 A further issue of interpretation concerns the pooling of students across grades and years since the Level 1/Level 2
cutoff varies by grade and year. As shown by Cattaneo et al. (2016), the pooled estimates in this context can be
interpreted as a “double average,” or the weighted average across cutoffs of the local average treatment effects for
students facing a given cutoff value.
Estimation


As in any RD analysis, misspecification of $f_Y(X_i)$ and $f_W(X_i)$ is a serious concern, as it 
could lead one to incorrectly conclude that there are (or are not) discontinuities at the Level 2 
cutoff. We report results from both parametric and nonparametric methods of implementing the 
fuzzy RD design. The nonparametric estimates are obtained using the “local polynomial” 
regression estimator proposed by Calonico, Cattaneo, and Titiunik (2014) (“CCT local linear” 
method below). IV estimates are obtained as the ratio of the local linear estimates of the 
reduced-form and first-stage effects. The CCT approach uses a “data driven” procedure to obtain the 
bandwidth of scores around the Level 1 cutoff to use in the estimation, and local linear 
regression to estimate the treatment effects.¹⁵ They show that their method generates more 
accurate confidence intervals than other nonparametric estimators that have been used in 
regression discontinuity studies. The parametric model produces IV estimates of $\theta$ via two-stage 
least squares (2SLS) regression, where the functions $f_Y$ and $f_W$ are modeled as second-degree 
polynomials (“2nd order polynomial” below).¹⁶ To show robustness to bandwidth choice, we 
report results for the parametric models with two different bandwidths: 1.0 and 0.5. These 
bandwidths were chosen because most of the selected CCT bandwidths are between 0.5 and 1.0. 
Although the summer assessment Level 1 data from most grades and cohorts extend at least four 
times the width of the Level 2 scores (i.e., to a bandwidth of at least 4.0), about 75 
percent of the summer assessment data fall within a bandwidth of 1.0. 
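
To make the parametric approach concrete, the following is a minimal Python sketch of a fuzzy RD estimated by 2SLS, with a quadratic in the running variable that is allowed to differ on each side of the cutoff. It is an illustration under stated assumptions, not the code used in this study: the function name `fuzzy_rd_2sls`, the simulated data, and all parameter values are hypothetical, baseline covariates are omitted, and no standard errors are computed.

```python
import numpy as np


def fuzzy_rd_2sls(y, w, x, bandwidth=1.0):
    """Fuzzy RD point estimate via 2SLS with a quadratic in the running variable.

    y: outcome, w: 1.0 if retained else 0.0, x: running variable (0 = cutoff).
    The instrument is the indicator for scoring below the cutoff (x < 0).
    Returns (IV estimate, first-stage discontinuity). No standard errors.
    """
    keep = np.abs(x) <= bandwidth
    y, w, x = y[keep], w[keep], x[keep]
    below = (x < 0).astype(float)
    # Quadratic in x with separate slope and curvature on each side of the cutoff.
    poly = np.column_stack([np.ones_like(x), x, x**2, below * x, below * x**2])
    z = np.column_stack([below, poly])
    # First stage: retention on the below-cutoff indicator and the polynomial.
    first, *_ = np.linalg.lstsq(z, w, rcond=None)
    w_hat = z @ first
    # Second stage: outcome on fitted retention and the same polynomial.
    second, *_ = np.linalg.lstsq(np.column_stack([w_hat, poly]), y, rcond=None)
    return second[0], first[0]


# Hypothetical simulated data: about 65 percent of students below the cutoff
# are retained, nobody above it is, and the assumed retention effect is 0.3.
rng = np.random.default_rng(0)
n = 40_000
x = rng.uniform(-1, 1, n)
w = ((rng.random(n) < 0.65) & (x < 0)).astype(float)
y = 0.5 * x + 0.3 * w + rng.normal(0, 0.5, n)
effect, first_stage = fuzzy_rd_2sls(y, w, x, bandwidth=1.0)
print(f"IV estimate: {effect:.3f}, first-stage jump: {first_stage:.3f}")
```

Because this model is just identified, the 2SLS coefficient equals the ratio of the reduced-form discontinuity to the first-stage discontinuity, which mirrors how the nonparametric IV estimates are formed.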


¹⁵ We actually use a simple extension of the method proposed by CCT to account for baseline covariates. 
Specifically, we use the residuals from a linear regression of a given outcome on the complete set of baseline 
covariates as the dependent variable in the CCT local linear procedure. Provided that the baseline covariates are 
uncorrelated with $T_i$ close to the Level 1 cutoff (which is true in a valid RD design), the point estimates from this 
modified procedure should agree with the estimates obtained using the unadjusted outcomes as the 
dependent variable. 

¹⁶ We use linear models despite the likely nonlinearity of the true conditional expectation functions that we are 
attempting to estimate. This is because our goal is to estimate local average treatment effects at the Level 1/Level 2 
cutoff, which can be done successfully using linear IV without having to make further parametric modeling 
assumptions (Angrist and Pischke, 2009). 
An important issue for implementing this approach in this setting is that the summer 
assessment consists of two scores, whereas the framework developed above assumed a 
one-dimensional running variable. We account for dual assignment scores by using the minimum of 
the math and ELA scores as the running variable (where each subject score has been rescaled to 
be zero at the passing cutoff, after scaling by the width of the Level 2 score range, as discussed 
above). Since students are assigned to retention if they score below the Level 2 cutoff on either 
math or ELA, whether a student is assigned to retention is determined by whether or not this 
minimum score variable is negative.¹⁷ 
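
The construction of the running variable described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example scale scores, cutoffs, and Level 2 widths are hypothetical and are not taken from the NYCDOE data.

```python
def binding_score(math_score, ela_score, math_cut, ela_cut, math_width, ela_width):
    """Minimum of the two rescaled subject scores.

    Each subject score is centered at its Level 2 cutoff and divided by the
    width of its Level 2 score range, so zero is the passing cutoff in both.
    """
    math_rescaled = (math_score - math_cut) / math_width
    ela_rescaled = (ela_score - ela_cut) / ela_width
    return min(math_rescaled, ela_rescaled)


def assigned_to_retention(score):
    """A student is assigned to retention when the minimum rescaled score is negative."""
    return score < 0


# Hypothetical student: passes ELA but falls below the math cutoff.
score = binding_score(640, 670, math_cut=650, ela_cut=650, math_width=40, ela_width=20)
print(score, assigned_to_retention(score))
```

Taking the minimum means that the subject in which the student is closest to failing (or furthest below passing) determines assignment, which is why a single negative score is sufficient.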

In addition to the results reported, we explored the robustness of our findings by 
fitting several additional model specifications. To examine sensitivity to functional form and 
bandwidth choices, we fit first- through third-degree polynomial specifications at bandwidths of 
0.5 and 1.0, as well as global polynomial specifications 
(exclusive of the lowest observable scale score). To explore the sensitivity of results to pooling 
across subject areas and school years, we also considered specifications 
that use each subject separately, as well as specifications that produce 
estimates separately before and after 2010, when the NYCDOE summer assessment was rescaled. The 
results of these additional robustness checks are consistent with the general conclusions reported 
below and are available from the authors upon request. 

¹⁷ Wong et al. (2013) refer to this method as the “centering approach” and Reardon and Robinson (2012) refer to it 
as the “binding-score RD.” For sensitivity, we also ran each of the models below using a single-assessment running 
variable; conclusions using either individual assessment as the running variable were similar to those using the 
minimum score. 

A final consideration for the estimation is how we account for the fact that we examine 
numerous outcomes separately by grade. To account for the simultaneous estimation of a series 
of treatment effects, we implement a False Discovery Rate adjustment (Benjamini & Hochberg, 
1995) at a rate of 0.05. Given that the outcomes examined are strongly correlated (for example, 
incidence of suspension and days suspended), the False Discovery Rate adjustment based upon 
the total number of tests is likely too conservative in compensating for the multiple testing. 
Therefore, in the tables below, we report statistical significance with and without the adjustment 
for multiple hypothesis testing. We discuss results based upon the unadjusted estimates. 
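
For readers unfamiliar with the adjustment, the Benjamini and Hochberg (1995) step-up procedure can be sketched as follows. This is a generic illustration rather than the code used for the tables; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean array marking hypotheses rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest p-value is compared with q * i / m.
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        # Step-up rule: find the largest rank meeting its threshold and
        # reject that hypothesis together with all smaller p-values.
        k = np.max(np.nonzero(passed)[0])
        reject[order[: k + 1]] = True
    return reject


# Hypothetical p-values from eight tests.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]))
```

The comparison thresholds depend on the total number of tests `m`, which is why counting strongly correlated outcomes as separate tests makes the adjustment more demanding.
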
RD Validity 

As noted above, our approach requires that scoring below the Level 2 cutoff be as good as 
randomly assigned local to the Level 2 cutoff. This assumption is plausible in this context. 
Neither the teachers nor students know the number of correct answers needed to meet the Level 2 
standard.¹⁸ Even with no test score manipulation, restricting the sample to students who 
remained in a NYCDOE school in the years following the summer assessment could undermine 
the research design if falling below the Level 2 cutoff affects the composition of students 
remaining in the sample. For instance, if students assigned to retention are more likely to leave 
the sample, the estimated impacts of retention could be upward biased if the students induced to 
leave were ones who would tend to have higher rates of behavioral problems than those who 
remained. On the other hand, if students induced to leave were the ones with more motivated 
parents and this was associated with lower rates of behavioral problems, there would be 
downward bias.¹⁹ We investigate sample attrition in Table 2 by modeling the probability of 
enrollment using a strict RD model with the same minimum scale score running variable used in 
our primary analyses. Scoring just below Level 2 reduces the probability of being enrolled in 
Year 1 (the academic year following the summer assessment) by about 1 percentage point.²⁰ 
Although these estimates are statistically significant, the magnitude of the differential attrition is 
small and the overall Year 1 attrition rate is very low; almost 97 percent of students are observed 
enrolled in the year after the summer assessment.²¹ For Years 2 and 3, the attrition rates are 
much higher but there is no evidence of differential attrition at the Level 2 cutoff. 

¹⁸ Multiple choice answer sheets were scanned locally at the summer school test site and automatically uploaded to 
the DOE’s administrative database, where a scoring key and raw score to scale score conversion were applied. 

¹⁹ The inclusion of baseline control variables in the FRD model should help mitigate potential bias resulting from 
restricting the sample to students who remained in a NYCDOE school following the summer assessment. 

We also conduct standard investigations of the validity of the RD design. Since these 
analyses were done on the main analytic sample (students who enrolled in Year 1), they would 
capture any violations in the research design due to either differential attrition or 
manipulation of the summer assessment scores at the Level 2 cutoff. First, we examine whether 
students are equally likely to score above or below the Level 2 cutoff by implementing the 
procedure proposed in McCrary (2008) to test for the presence of a discontinuity in the density of 
test scores. The estimated discontinuity in the log of the density is 0.022 with a standard error of 
0.031 (p=0.483),²² and as shown in Figure 1, there is no visual indication that the distribution of 
test scores is discontinuous at the Level 2 cutoff. We replicated this test within each grade and 
for each subject individually across and within grade. No significant indications of a 
discontinuity were discovered. Second, we tested whether there were sharp differences in 
baseline covariates at the Level 2 cutoff. Table 3 shows estimated discontinuities in baseline 
characteristics at the Level 2 cutoff. None of these estimates were found to be statistically 
significant for the CCT local linear or 2nd order polynomial specifications. Note that this analysis 
was conducted for the sample of students that remained in the data through Year 1, so if 
differential attrition were generating differences in observable characteristics at the Level 
1/Level 2 cutoff, it would have been detectable in this analysis. Overall, we interpret these results 
as supporting the assumptions underlying the research design, despite some differential attrition 
in Year 1. 

²⁰ Throughout this article, we refer to the Xth year after the summer assessment as “Year X”. 

²¹ Note that for this attrition analysis, we include all students who took the summer assessment (and who met other 
inclusion criteria), including those who attrit before Year 1. This allows us to test for differential attrition in Year 1. 
In contrast, in our primary analyses, we restrict the sample to those who enroll in Year 1 so that we can determine 
whether they were retained (which is necessary to conduct the fuzzy regression discontinuity analysis). 

²² As suggested in McCrary (2008), we compute the basic bandwidth and use half that bandwidth to reduce bias in 
estimating the discontinuity. 
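
The logic of the density check can be illustrated with a simplified Python sketch in the spirit of McCrary (2008): bin the running variable, smooth the bin heights with a local linear regression on each side of the cutoff, and compare the log of the two estimated densities at the cutoff. This is a simplification under stated assumptions and not the procedure as implemented for this study: the bin width and bandwidth are set by hand rather than chosen by a data-driven rule, no standard error is computed, and the function names and simulated samples are hypothetical.

```python
import numpy as np


def local_linear_at_zero(x, y, h):
    """Triangular-kernel local linear fit of y on x; returns the fit at x = 0."""
    k = np.clip(1 - np.abs(x) / h, 0, None)   # kernel weights, zero beyond h
    sw = np.sqrt(k)
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta[0]


def log_density_jump(r, bin_width, h):
    """Log density just above the cutoff minus log density just below it.

    r is the running variable centered so that the cutoff is zero.
    """
    # Bin edges are integer multiples of bin_width, so no bin straddles zero.
    n_lo = int(np.floor(r.min() / bin_width))
    n_hi = int(np.ceil(r.max() / bin_width))
    edges = np.arange(n_lo, n_hi + 1) * bin_width
    counts, _ = np.histogram(r, bins=edges)
    mids = (edges[:-1] + edges[1:]) / 2
    heights = counts / (r.size * bin_width)   # normalized histogram heights
    left = mids < 0
    f_below = local_linear_at_zero(mids[left], heights[left], h)
    f_above = local_linear_at_zero(mids[~left], heights[~left], h)
    return np.log(f_above) - np.log(f_below)


# Hypothetical samples: one with a smooth density, one with excess mass above zero.
rng = np.random.default_rng(0)
smooth = rng.uniform(-1, 1, 50_000)
heaped = np.concatenate([rng.uniform(-1, 0, 10_000), rng.uniform(0, 1, 30_000)])
print(log_density_jump(smooth, bin_width=0.05, h=0.5))
print(log_density_jump(heaped, bin_width=0.05, h=0.5))
```

A value near zero is what one would expect when scores cannot be manipulated around the cutoff, while a large positive or negative value would indicate sorting to one side.
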
Results 
First Stage Estimates 

Table 4 displays first-stage estimates of the effect of scoring Level 1 on the summer 
assessment on the probability of being retained for each policy grade and also pooled across all 
grades. The pooled estimates demonstrate a strong increase in the probability of retention for 
Level 1 students. Level 1 students around the Level 2 cut-point have a probability of retention 
approximately 0.65 higher than their Level 2 counterparts at the cut-point, and the F-statistics for 
the hypothesis that the discontinuity is zero are all greater than 400, well above the thresholds 
commonly used to determine whether there are “weak instruments” (Staiger & Stock, 1997; 
Stock & Yogo, 2005). These pooled first stage results are displayed graphically in Figure 2. As 
seen in the figure, the probability of retention for the Level 2 students at the cut point is 
essentially zero, with only the Level 1 students being retained, consistent with expectations 
under the policy. Examining the probability of retention results by grade in Table 4, the first 
stage effect for Level 1 eighth grade students at the Level 2 cut point is lower than the pooled 
average, at approximately 0.50 (note the grade-specific estimates are produced by separate 
regressions by grade). All other grades have a first stage effect of at least 0.63, with 
fourth graders showing a first stage effect of 0.74. These results provide evidence that the 
retention portion of the policy was implemented and support the FRD design detailed above. 


Effects of Retention on Suspensions 
Table 5 shows estimates of the impact of retention on suspension outcomes pooled over 
all grades, and Tables 6a and 6b break these estimates out by grade (Appendix Table 1 shows 
means of the dependent variables by grade). Incidence of suspension and the total days 
suspended are both considered, up to three years removed from the year of retention. The pooled 
estimates in Table 5 show no significant effects of retention on suspension incidence or 
cumulative suspension days. This is corroborated by the visual evidence in Figure 3, which shows 
that neither the incidence of suspension nor the number of days suspended changes 
discontinuously at the Level 1 cutoff score. 

However, several isolated grade-specific effects are present. The Year 1 estimates for 
fifth graders (Table 6a) suggest retention may reduce suspensions, although the estimates for 
both the incidence of suspension and number of days are not always statistically significant 
across specifications. These effects are not statistically significant for Years 2 or 3, however, and 
we find no robust evidence of retention effects for third or fourth graders. For older grades 
(Table 6b), retention generates an increase of about 2-3 days suspended in Year 2 for seventh 
graders, but there are no other effects found for seventh graders. Similarly, we do not find 
consistent evidence across specifications of effects for sixth or eighth graders. 

Taken together these results suggest that a systematic or persistent effect of retention on 
suspension outcomes is not present in the aggregate or at any individual grade. We do find some 
instances where the estimates indicate that retention in a particular grade increases or decreases 
suspensions, but these effects never extend beyond a single year. 

Effects of Retention on Attendance 
Table 5 displays estimates of the effects of retention on attendance pooled over all grades, 
and Tables 7a and 7b consider these effects by grade. When pooling across grades, we do not 
find any evidence that retention affects the incidence of chronic absence. This is consistent with 
the graphical evidence in Figure 4. For the attendance rate, the Year 1 effect is small and 
statistically insignificant. The Year 2 estimates suggest retention increases the attendance rate by 
about 1 percentage point. The Year 3 estimates are of similar magnitude but are less precise and 
statistically significant only for the parametric model with a bandwidth of 1. The graphical 
evidence in Figure 4 is not definitive, but offers some indication that the attendance rate in Years 
2 and 3 is a little higher for students barely falling below the Level 2 cutoff than for those 
scoring barely above it. 

Turning to the effects by grade, we do not find any evidence of effects on the attendance 
rate or chronic absenteeism for grades 4 and 6. In contrast, the estimates suggest that grade 
retention reduces chronic absenteeism for third graders by about 5 percentage points in Years 1 
and 3 (the estimated effect for Year 2 is smaller and statistically insignificant).²³ At the same 
time, we find little evidence that retention in grade 3 increases the overall attendance rate; only 
the Year 3 estimate from the 2nd order polynomial specification with a bandwidth of 1 is 
statistically significant. For fifth graders, retention increases the Year 1 attendance rate by 1 
percentage point, but the effects for Year 2 and Year 3 were small and statistically insignificant. 
The estimated effects on chronic absenteeism were all statistically insignificant. 

The estimates for seventh graders indicate retention improves the attendance rate in Year 
2 by 4-5 percentage points. The estimates for Year 3 are similar in magnitude but not statistically 
significant for the bandwidth of 0.5 specification. The Year 1 estimates are modest in magnitude 
(about 1.5 percentage points) and statistically significant only for the parametric model with a 
bandwidth of 1. The point estimates all suggest that retention may have lowered the incidence of 
chronic absence, but none are statistically significant. 


²³ Estimates beyond year three are also essentially nil. 
For eighth grade, we find that retention increases the likelihood of Year 1 chronic 
absenteeism by 12-15 percentage points. This worsening of chronic absenteeism dissipates beyond 
Year 1; the effects are not statistically significant for Years 2 and 3. While the chronic 
absenteeism point estimates drop from Year 1 to Year 2, they remain sizable in Year 2 before 
dropping further in Year 3. We also find negative effects of retention on the Year 1 attendance 
rate, although this is only statistically significant for the CCT estimator. For Year 2 and 3, 
however, we find non-significant estimates of retention on the overall attendance rate that are 
much smaller and sometimes positive. Thus any effect after Year 1 on chronic absenteeism 
appears to be driven by small changes in the attendance rate of students near the threshold for 
chronic absenteeism rather than a wholesale change in the attendance patterns following 
retention. 

To summarize, the significant effects on increased attendance rates in the pooled grade 
estimates in the second year post-retention appear to be driven primarily by effects among 
students retained in seventh grade, as the Year 2 and 3 effects on attendance rates for third 
through sixth graders are either small or negative. The results for seventh graders and the pooled 
results also stand in contrast to the results for eighth graders, where we find evidence that grade 
retention leads to a temporary but sizable increase in chronic absenteeism. 

Discussion 

The results discussed above do not provide evidence of systematic, persistent effects of 
retention on attendance or suspension outcomes. These findings are in the context of the broader 
promotion policy instituted by NYCDOE over the cohort years examined, which offered 
multiple supplemental services in the proximal and following cohort years to both retained and 
marginally promoted students. When we pool across grades, the estimates tend to be small and 
statistically insignificant. An exception is that retention appears to have a modest positive effect 
on Year 2 attendance, which is driven mainly by effects for seventh graders. Otherwise, we do 
not find strong evidence of pooled effects of retention on behavioral outcomes. 

In the grade-specific analyses, we find some “one off” cases where retention appears to 
affect suspensions or attendance, but these effects do not persist for multiple years post-retention 
and do not consistently indicate positive or negative effects of retention on behavioral problems. 
For elementary school grades, some of the results suggest that retention reduces suspensions in 
the year that retained students were in fifth grade and their promoted counterparts were in sixth 
grade. However, any such effects do not persist once the retained students would be in sixth 
grade. A similar pattern appears for isolated improvements in attendance and chronic 
absenteeism. For middle school grades, we find some conflicting evidence of the effect of 
retention on attendance, with retention in seventh grade potentially improving attendance while 
retention in eighth grade increases the incidence of chronic absenteeism in the following year. 
Otherwise, any statistically significant effects we find did not last for more than one year. 

One interesting pattern that emerges from the analysis of grade-specific estimates is that 
most of the instances of statistically significant estimates coincide with the transitions into and 
out of middle school (in New York City, middle schools most commonly cover grades 6 through 
8). Thus, to the extent that retention affects suspensions or attendance in our sample, it may be 
driven by differences in prevalence of suspensions and absenteeism in elementary, middle, and 
high school. Moreover, since suspensions and truancy are quite rare in elementary grades, these 
may be a coarse indicator of disciplinary problems in those grades. However, it is not always 
clear from our results how retention interacts with middle school transitions. For instance, we 
find that retention of seventh graders improves attendance in the year that promoted students 
would be in ninth grade, but that retention of eighth graders worsens chronic absence in the year 
that promoted students would be in ninth grade. This suggests retention during middle school 
may provoke different short-term reactions depending on the proximity of the retention to 
ascension to high school. 

There are two important caveats to consider with respect to our findings. The effects 
estimated in this analysis apply to students near the treatment cut-point between Level 1 and 
Level 2 scoring on the summer assessment. In addition, the FRD design produces treatment 
effect estimates for the students whose retention status is determined by scoring Level 1 on the 
summer assessment. The “fuzziness” in the assignment process reflects the use of non-test score 
factors (e.g., portfolios) to make retention decisions, which aligns with recommendations 
concerning assessment-based policies (Heubert & Hauser, 1999). However, an implication for 
our study is that our findings might not be applicable for students whose test scores fell below 
the grade promotion standard but were nonetheless promoted. 

Attendance and suspensions, while being important metrics in understanding the 
unintended consequences of retention, are relatively broad measures that, of course, do not fully 
describe student behavior. This may be particularly true in the elementary grades, where truancy 
is less a matter of autonomous choice and suspension is less likely to be used in response 
to inappropriate behavior. Research examining additional facets of student behavior, such as 
teacher and peer interactions, classroom citizenship, and non-academic classroom function, as 
well as broader socio-emotional outcomes would help inform a more complete understanding of 
the impact of retention. Le, Mariano, and Crego (2009) found that retained elementary students 
in New York City had comparable self-confidence in reading and math and a higher sense of 
school belonging than their promoted peers, and also emphasized the need to further explore 
non-academic outcomes of retention. 

Our results on attendance and suspension outcomes stand in sharp contrast to those of an 
earlier literature that finds that grade retention is positively correlated with such behavioral 
problems. We think this divergence is most likely attributable to our use of a research design that 
delivers causal estimates of the impact of retention. These results broaden our understanding of 
the causal impact of retention on future behavioral outcomes in two ways. First, we expand upon 
Ozek’s (2015) prior causal examination of retention based upon third grade ELA performance by 
examining retention decisions for multiple grades, three through eight, and based upon 
proficiency in multiple subjects, ELA and math. Second, our effect estimates correspond to a 
different point on the distribution of proficiency in New York City than that of Ozek’s (2015) 
prior examination of similar outcomes in Florida. The two treatment cut-points are not directly 
comparable; unlike NYCDOE’s summer assessment administered to the spring Level 1 
population, Florida’s spring assessments, administered to all students, were used to determine 
retention assignment. However, sixteen percent of Florida third grade students were eligible for 
summer school, and eight percent were retained, as opposed to seven percent of third graders 
required to attend summer school and three percent retained in New York City.²⁴ In contrast to 
our results, Ozek (2015) does find that retention generated a short-term increase in behavioral 
problems, followed by a significant decrease in the third post-retention year. The presence of 
these differences is a reminder that retention policies in various jurisdictions each have local 
characteristics. Such characteristics need to be considered in interpreting our and other results 
and highlight the need to continue to expand the examination of the effects of retention on 
behavioral outcomes across broader policy circumstances using causal designs. 


²⁴ Across all grades and cohorts in the years studied, six percent of eligible New York City students were required to 
attend summer school and two and a half percent were retained. 
Our results demonstrate that retaining students in elementary and middle school does not 
necessarily increase behavioral problems as measured by absences and suspensions. Contrary to 
the prior observational literature, this paper provides an example of a formal test-based 
promotion policy that did not generate systematic negative effects on attendance and suspension, 
at least for students scoring close to the test score threshold used for determining grade 
promotion and in the context of the NYCDOE promotion policy. The instances where we do find 
negative effects of retention for middle schoolers only last for a single year rather than being 
persistent long-term effects. Although impacts on behavioral outcomes are only one 
consideration for evaluating the pros and cons of a retention policy, the experience of the policy 
in place in New York City over the period examined implies that 
implementing such policies does not necessarily start at a deficit with respect to unintended 
behavioral consequences. 
Table 1: Sample Summary Statistics 

                                          Analysis sample (bandwidth of 1 and conditional on 
                                          observing grade in year following summer assessment) 
                           All tested 
                           students       All            Promoted       Retained 
Outcomes from year prior to summer assessment 
  Any suspension           0.082          0.084          0.083          0.084 
  Days suspended           1.188          [illegible]    [illegible]    1.286 
  Attendance rate          0.899          0.901          0.904          0.894 
  Attendance rate missing  0.000          0.000          0.000          0.000 
  Spring ELA z-score       -1.000         -0.993         -0.929         -1.120 
  Spring Math z-score      -1.129         -1.104         -1.015         -1.278 
  Spring Math missing      0.011          0.008          0.009          0.007 
  Spring ELA missing       0.036          0.029          0.031          0.025 
Student Demographics 
  Old for grade            0.369          0.367          0.372          0.357 
  Ever ELL                 0.144          0.148          0.156          0.134 
  Male                     0.520          0.520          0.517          0.526 
  Hispanic                 0.387          0.394          0.403          0.376 
  Black                    0.499          0.494          0.479          0.522 
  White                    0.030          0.029          0.033          0.022 
  Other race               0.084          0.083          0.085          0.081 
Enrolled in future years 
  Enrolled 1 yr after      0.968          1.000          1.000          1.000 
  Enrolled 2 yrs after     0.927          0.952          0.953          0.949 
  Enrolled 3 yrs after     0.891          0.914          0.913          0.914 
Sample size                92425          76167          50413          25754 
Table 2: Fraction Enrolled in NYCDOE Schools by Year Since Summer Assessment 

                  Estimated Effect 
Year 1            -0.012**       -0.012**       -0.010** 
                  (0.004)        (0.004)        (0.003) 
Year 2            -0.003         -0.003         -0.005 
                  (0.006)        (0.006)        (0.005) 
Year 3            -0.009         -0.007         -0.010 
                  (0.006)        (0.009)        (0.006) 
Bandwidth         0.5            1              Varies 
Specification     2nd order      2nd order      CCT Local 
                  Polynomial     Polynomial     Linear 


Note: Entries are estimated discontinuities in the probability of being enrolled at the Level 1/Level 2 cutoff. The 
estimated standard errors are in parentheses (these are adjusted for clustering at the running variable level in the 
parametric models). Results in the first two columns use a quadratic function of the running variable (the minimum 
of the rescaled math and ELA summer assessment scores), where the function is allowed to have different 
coefficients on either side of the Level 2 cutoff. Results in the final column are based on the local linear estimator 
proposed by Calonico et al. (2014). 

* Significant at α = 0.05; ** Significant at α = 0.01 
Table 3: Estimated Discontinuity in Baseline Covariates (Coefficient on "Scoring Level 1") 

Covariate                    Estimated Discontinuity (with s.e.) 
Any days suspended (w/o      0.010          0.006          -0.008 
  2004 and 2005 cohorts)     (0.008)        (0.006)        (0.007) 
Days suspended (w/o 2004     -0.099         -0.136         0.077 
  and 2005 cohorts)          (0.194)        (0.148)        (0.151) 
Any days suspended           0.008          0.005          -0.007 
                             (0.007)        (0.005)        (0.006) 
Days suspended               -0.105         -0.113         0.092 
                             (0.167)        (0.124)        (0.121) 
Attendance rate              -0.002         -0.002         0.002 
                             (0.002)        (0.002)        (0.002) 
Missing attendance rate      0.000          0.000          -0.000 
                             (0.000)        (0.000)        (0.000) 
Spring ELA z-score           -0.023         -0.003         -0.004 
                             (0.014)        (0.011)        (0.013) 
Spring Math z-score          0.006          0.008          0.011 
                             (0.016)        (0.012)        (0.013) 
Missing spring math          -0.001         -0.001         -0.004 
                             (0.002)        (0.002)        (0.004) 
Missing spring ELA           0.002          0.003          0.001 
                             (0.004)        (0.003)        (0.002) 
Grade                        0.011          -0.026         -0.016 
                             (0.045)        (0.034)        (0.048) 
Cohort                       0.027          0.008          0.027 
                             (0.065)        (0.049)        (0.062) 
Old for Grade                -0.013         -0.009         0.013 
                             (0.012)        (0.009)        (0.009) 
Ever ELL                     0.004          0.001          -0.001 
                             (0.009)        (0.007)        (0.008) 
Male                         -0.009         0.005          0.000 
                             (0.012)        (0.009)        (0.010) 
Hispanic                     0.000          -0.002         0.001 
                             (0.012)        (0.009)        (0.009) 
Black                        -0.001         0.004          -0.003 
                             (0.012)        (0.009)        (0.010) 
White                        -0.006         -0.005         0.006 
                             (0.004)        (0.003)        (0.003) 
Bandwidth                    0.5            1              Varies 
Specification                2nd order      2nd order      CCT Local 
                             polynomial     polynomial     Linear 

Note: Cell entries are estimated discontinuities in baseline variables at the Level 1/Level 2 cutoff, and the estimated 
standard errors are in parentheses (adjusted for clustering at the running variable level in the parametric models). 
Results in the first two columns use a quadratic function of the running variable (the minimum of the rescaled math 
and ELA summer assessment scores), where the function is allowed to have different coefficients on either side of 
the Level 2 cutoff. Results in the final column are based on the local linear regression estimator proposed by 
Calonico et al. (2014). 
* Significant at α = 0.05; ** Significant at α = 0.01 
Table 4: Estimated Effect of Being Level 1 on Summer Assessment on Probability of Retention 

Grade                Estimated First-Stage Effect 
All (N=76,167)       0.642**†       0.648**†       0.646**† 
                     (0.009)        (0.006)        (0.008) 
3 (N=25,898)         0.676**†       0.686**†       0.682**† 
                     (0.015)        (0.011)        (0.012) 
4 (N=[illegible])    0.768**†       0.736**†       0.739**† 
                     (0.033)        (0.023)        (0.024) 
5 (N=14,186)         0.621**†       0.632**†       0.639**† 
                     (0.020)        (0.015)        (0.016) 
6 (N=10,475)         0.686**†       0.682**†       0.681**† 
                     (0.021)        (0.016)        (0.021) 
7 (N=11,684)         0.653**†       0.654**†       0.653**† 
                     (0.021)        (0.016)        (0.016) 
8 (N=9,171)          0.489**†       0.501**†       0.501**† 
                     (0.025)        (0.019)        (0.021) 
Controls?            Y              Y              Y 
Bandwidth            0.5            1              Varies 
Specification        2nd Order      2nd Order      CCT Local 
                     Polynomial     Polynomial     Linear 


Notes: Sample sizes refer to the sample enrolled in the year after the summer assessment and that fall within a 
bandwidth of one from the Level 1/Level 2 cutoff. Cell entries are estimated discontinuities in the probability of 
being retained at the Level 1/ Level 2 cutoff, and the estimated standard errors are in parentheses (adjusted for 
clustering at the running variable level in the parametric models). Results in the first two columns use a quadratic 
function of the running variable (the minimum of the rescaled math and ELA summer assessment scores), where the 
function is allowed to have different coefficients on either side of the Level 2 cutoff. Results in the final column are 
based on the local linear regression estimator proposed by Calonico et al. (2014). 

* Significant at α = 0.05; ** Significant at α = 0.01; † Significant after a False Discovery Rate correction, α = 0.05 
Table 5: Effect of Retention on Suspension and Attendance, Pooled Grades 3-8 

Outcome                         Estimated Treatment Effect (with s.e.) 
Any Suspension, 1 yr after      -0.010         -0.014         -0.013 
  (CCM=0.125)                   (0.012)        (0.009)        (0.009) 
Any Suspension, 2 yrs after     -0.001         0.004          -0.001 
  (CCM=0.109)                   (0.012)        (0.009)        (0.009) 
Any Suspension, 3 yrs after     0.017          0.012          0.010 
  (CCM=0.114)                   (0.015)        (0.011)        (0.011) 
Days suspended, 1 yr after      0.322          0.221          0.146 
  (CCM=1.519)                   (0.302)        (0.238)        (0.228) 
Days suspended, 2 yrs after     0.263          0.402          0.363 
  (CCM=1.426)                   (0.387)        (0.282)        (0.294) 
Days suspended, 3 yrs after     -0.007         -0.078         -0.136 
  (CCM=2.115)                   (0.477)        (0.342)        (0.303) 
Attendance rate, 1 yr after     0.003          0.005          0.003 
  (CCM=0.882)                   (0.004)        (0.003)        (0.003) 
Attendance rate, 2 yrs after    0.012*         0.010**†       0.010** 
  (CCM=0.868)                   (0.005)        (0.004)        (0.004) 
Attendance rate, 3 yrs after    0.010          0.011*         0.010 
  (CCM=0.846)                   (0.007)        (0.005)        (0.005) 
Chronic absence, 1 yr after     -0.010         -0.018         -0.020 
  (CCM=0.414)                   (0.016)        (0.012)        (0.012) 
Chronic absence, 2 yrs after    -0.004         -0.014         -0.012 
  (CCM=0.418)                   (0.017)        (0.013)        (0.012) 
Chronic absence, 3 yrs after    -0.020         -0.017         -0.021 
  (CCM=0.461)                   (0.021)        (0.015)        (0.015) 
Controls?                       Y              Y              Y 
Bandwidth                       0.5            1              Varies 
Specification                   2nd Order      2nd Order      CCT Local 
                                Polynomial     Polynomial     Linear 


Notes: Entries are IV estimates of the effect of retention on a given outcome with standard errors in parentheses 
(adjusted for clustering at the running variable level in the parametric models). Results in the first two columns are 
2SLS estimates from a parametric model that uses a quadratic function of the running variable. Results in the final 
column are IV estimates generated by the local linear regression estimator proposed by Calonico et al. (2014). 
Sample means are “control complier means” (Katz, Kling, & Liebman, 2001) that are calculated as the difference 
between the mean outcome for retained students who score just below the Level 1/2 cutoff (i.e., “treated complier 
mean”) and the IV estimate. 

* Significant at α = 0.05; ** Significant at α = 0.01; † Significant after a False Discovery Rate correction, α = 0.05
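
The arithmetic behind the control complier means in the notes to Table 5 can be sketched as follows. The snippet also shows the ratio form of a fuzzy RD estimate: when the same polynomial, bandwidth, and controls are used in both stages, the 2SLS estimate equals the jump in the outcome divided by the jump in retention. The function names and the reduced-form inputs are hypothetical. Pairing the reported CCM with the first-column estimate is also an assumption made only for illustration, since the notes do not say which column the CCM is based on.

```python
def wald_iv_estimate(reduced_form_jump, first_stage_jump):
    """Fuzzy RD estimate: jump in the outcome divided by jump in retention."""
    if first_stage_jump == 0:
        raise ValueError("first-stage jump must be nonzero")
    return reduced_form_jump / first_stage_jump


def control_complier_mean(treated_complier_mean, iv_estimate):
    """CCM as defined in the notes: treated complier mean minus IV estimate."""
    return treated_complier_mean - iv_estimate


# Hypothetical inputs: a -0.0064 jump in the outcome, a 0.64 jump in retention.
iv_hypothetical = wald_iv_estimate(-0.0064, 0.64)

# Reported values for "Any Suspension, 1 yr after" (first-column estimate).
ccm_reported = 0.125
iv_reported = -0.010

# Inverting the definition gives the implied treated complier mean.
tcm_implied = ccm_reported + iv_reported

# Size of the estimated effect relative to the control complier mean.
relative_effect = iv_reported / ccm_reported

print(f"Implied treated complier mean: {tcm_implied:.3f}")
print(f"Effect relative to CCM: {relative_effect:.1%}")
```

Expressing an estimate relative to its CCM is one way to gauge its magnitude. Here the point estimate is a small fraction of the baseline suspension rate and is not statistically significant.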

Table 6a: Effect of Retention on Suspension, by Grade (Grades 3-5)

       Years since
Grade  summer test  Any Suspensions                     Days Suspended
3      1            -0.001      -0.008      -0.008      0.167       0.003       0.022
                    (0.013)     (0.009)     (0.009)     (0.203)     (0.136)     (0.139)
       2            0.016       0.013       0.011       -0.178      0.069       0.001
                    (0.015)     (0.011)     (0.010)     (0.210)     (0.160)     (0.168)
       3            0.002       -0.007      0.001       -0.333      -0.303      -0.281
                    (0.022)     (0.016)     (0.016)     (0.477)     (0.339)     (0.343)
4      1            -0.011      0.014       0.000       0.059       0.227       0.120
                    (0.030)     (0.022)     (0.023)     (0.242)     (0.218)     (0.214)
       2            -0.090*     -0.024      -0.052      -0.730      -0.250      -0.246
                    (0.037)     (0.028)     (0.030)     (0.700)     (0.533)     (0.529)
       3            -0.020      -0.041      -0.062      -2.530      -0.900      -2.320
                    (0.051)     (0.039)     (0.040)     (1.650)     (1.231)     (1.366)
5      1            -0.039      -0.045*     -0.044*     -0.142      -0.766*     -0.710
                    (0.025)     (0.019)     (0.018)     (0.503)     (0.380)     (0.369)
       2            -0.050      -0.005      -0.012      -0.873      -0.377      -0.543
                    (0.031)     (0.022)     (0.019)     (1.022)     (0.711)     (0.695)
       3            -0.008      0.032       0.020       -1.307      -0.612      -1.040
                    (0.038)     (0.029)     (0.029)     (1.442)     (1.029)     (0.995)
Controls            Y           Y           Y           Y           Y           Y
Bandwidth           0.5         1           Varies      0.5         1           Varies
Specification       2nd order   2nd order   CCT Local   2nd order   2nd order   CCT Local
                    polynomial  polynomial  Linear      polynomial  polynomial  Linear


Notes: Entries are IV estimates of the effect of retention on a given outcome with standard errors in parentheses (adjusted for clustering at the running variable 
level in the parametric models). Results in columns 1, 2, 4 and 5 are 2SLS estimates from a parametric model that uses a quadratic function of the running 
variable, where the function is allowed to have different coefficients on either side of the Level 2 cutoff. Results in columns 3 and 6 are IV estimates generated 
by the local linear regression estimator proposed by Calonico et al. (2014). See Appendix Table A1 for outcome means by grade. 

* Significant at α = 0.05; ** Significant at α = 0.01; † Significant after a False Discovery Rate correction, α = 0.05

Table 6b: Effect of Retention on Suspension, by Grade (Grades 6-8)

       Years since
Grade  summer test  Any Suspensions                     Days Suspended
6      1            0.009       0.003       0.005       0.549       0.336       0.384
                    (0.031)     (0.024)     (0.028)     (0.970)     (0.813)     (0.908)
       2            0.030       0.018       0.019       1.097       0.142       0.752
                    (0.031)     (0.024)     (0.027)     (1.264)     (1.004)     (1.116)
       3            0.080*      0.040       0.060       3.845*      [illegible] 2.437
                    (0.037)     (0.030)     (0.032)     (1.531)     (1.179)     (1.327)
7      1            -0.027      -0.015      -0.017      0.321       0.626       0.500
                    (0.034)     (0.026)     (0.027)     (0.998)     (0.826)     (0.701)
       2            0.005       0.019       0.024       2.345*      [illegible] [illegible]
                    (0.034)     (0.026)     (0.025)     (1.151)     (0.865)     (0.903)
       3            -0.001      0.012       0.004       -1.448      -0.356      -0.484
                    (0.042)     (0.031)     (0.031)     (1.058)     (0.784)     (0.707)
8      1            0.012       -0.021      0.005       0.978       1.479       1.355
                    (0.050)     (0.038)     (0.044)     (1.124)     (0.953)     (0.953)
       2            0.025       -0.029      -0.006      -0.296      -0.771      -0.580
                    (0.049)     (0.037)     (0.039)     (1.373)     (1.054)     (1.107)
       3            0.052       0.030       0.049       1.198       0.023       [illegible]
                    (0.049)     (0.038)     (0.044)     (1.450)     (1.047)     (1.349)
Controls            Y           Y           Y           Y           Y           Y
Bandwidth           0.5         1           Varies      0.5         1           Varies
Specification       2nd order   2nd order   CCT Local   2nd order   2nd order   CCT Local
                    polynomial  polynomial  Linear      polynomial  polynomial  Linear


Notes: Entries are IV estimates of the effect of retention on a given outcome with standard errors in parentheses (adjusted for clustering at the running variable 
level in the parametric models). Results in columns 1, 2, 4 and 5 are 2SLS estimates from a parametric model that uses a quadratic function of the running 
variable, where the function is allowed to have different coefficients on either side of the Level 2 cutoff. Results in columns 3 and 6 are IV estimates generated 
by the local linear regression estimator proposed by Calonico et al. (2014). See Appendix Table A1 for outcome means by grade. 

* Significant at α = 0.05; ** Significant at α = 0.01; † Significant after a False Discovery Rate correction, α = 0.05

Table 7a: Effects of Grade Retention on Attendance, by Grade (Grades 3-5)

       Years since
Grade  summer test  Attendance Rate                     Chronic Absences
3      1            0.006       0.005       0.005       -0.046      -0.056**†   -0.051**
                    (0.004)     (0.003)     (0.003)     (0.026)     (0.019)     (0.019)
       2            0.004       0.003       0.002       -0.003      -0.010      -0.002
                    (0.004)     (0.003)     (0.003)     (0.028)     (0.020)     (0.020)
       3            0.009       0.010*      0.008       -0.070*     -0.054*     -0.056*
                    (0.006)     (0.004)     (0.004)     (0.034)     (0.025)     (0.025)
4      1            -0.000      -0.002      -0.001      0.005       0.029       0.024
                    (0.007)     (0.006)     (0.006)     (0.049)     (0.038)     (0.037)
       2            -0.000      -0.001      -0.002      -0.025      -0.011      -0.029
                    (0.009)     (0.007)     (0.007)     (0.056)     (0.042)     (0.046)
       3            0.016       0.001       0.006       -0.048      0.029       0.007
                    (0.015)     (0.012)     (0.011)     (0.076)     (0.058)     (0.060)
5      1            0.015*      0.010*      0.010*      -0.033      -0.032      -0.039
                    (0.006)     (0.005)     (0.004)     (0.037)     (0.027)     (0.027)
       2            0.005       0.002       0.003       0.010       -0.014      -0.019
                    (0.008)     (0.006)     (0.006)     (0.040)     (0.030)     (0.028)
       3            -0.000      -0.004      0.000       0.014       0.034       0.019
                    (0.011)     (0.008)     (0.008)     (0.049)     (0.036)     (0.036)
Controls            Y           Y           Y           Y           Y           Y
Bandwidth           0.5         1           Varies      0.5         1           Varies
Specification       2nd order   2nd order   CCT Local   2nd order   2nd order   CCT Local
                    polynomial  polynomial  Linear      polynomial  polynomial  Linear


Notes: Entries are IV estimates of the effect of retention on a given outcome with standard errors in parentheses (adjusted for clustering at the running variable 
level in the parametric models). Results in columns 1, 2, 4 and 5 are 2SLS estimates from a parametric model that uses a quadratic function of the running 
variable, where the function is allowed to have different coefficients on either side of the Level 2 cutoff. Results in columns 3 and 6 are IV estimates generated 
by the local linear regression estimator proposed by Calonico et al. (2014). See Appendix Table A1 for outcome means by grade. 

* Significant at α = 0.05; ** Significant at α = 0.01; † Significant after a False Discovery Rate correction, α = 0.05

Table 7b: Effects of Grade Retention on Attendance, by Grade (Grades 6-8)

       Years since
Grade  summer test  Attendance Rate                     Chronic Absences
6      1            0.006       0.006       0.009       -0.053      -0.029      -0.052
                    (0.007)     (0.005)     (0.007)     (0.035)     (0.027)     (0.035)
       2            -0.012      -0.003      -0.009      -0.072      -0.056      -0.069
                    (0.009)     (0.007)     (0.010)     (0.038)     (0.030)     (0.037)
       3            -0.019      -0.004      -0.009      0.022       0.002       0.013
                    (0.017)     (0.013)     (0.015)     (0.049)     (0.039)     (0.046)
7      1            0.011       0.017*      0.013       -0.018      -0.048      -0.043
                    (0.009)     (0.007)     (0.007)     (0.039)     (0.030)     (0.032)
       2            0.047**     0.039**†    0.046**     -0.030      -0.026      -0.026
                    (0.015)     (0.011)     (0.013)     (0.043)     (0.033)     (0.031)
       3            0.032       0.056**     0.046*      -0.042      -0.069      -0.063
                    (0.025)     (0.019)     (0.020)     (0.056)     (0.042)     (0.043)
8      1            -0.033      -0.019      -0.036*     0.146**     0.126**†    0.130**
                    (0.020)     (0.015)     (0.017)     (0.056)     (0.042)     (0.045)
       2            0.033       0.021       0.025       0.089       0.054       0.084
                    (0.029)     (0.022)     (0.024)     (0.061)     (0.046)     (0.054)
       3            0.017       -0.008      0.008       0.025       0.021       0.024
                    (0.037)     (0.028)     (0.031)     (0.066)     (0.051)     (0.055)
Controls            Y           Y           Y           Y           Y           Y
Bandwidth           0.5         1           Varies      0.5         1           Varies
Specification       2nd order   2nd order   CCT Local   2nd order   2nd order   CCT Local
                    polynomial  polynomial  Linear      polynomial  polynomial  Linear


Notes: Entries are IV estimates of the effect of retention on a given outcome with standard errors in parentheses (adjusted for clustering at the running variable 
level in the parametric models). Results in columns 1, 2, 4 and 5 are 2SLS estimates from a parametric model that uses a quadratic function of the running 
variable, where the function is allowed to have different coefficients on either side of the Level 2 cutoff. Results in columns 3 and 6 are IV estimates generated 
by the local linear regression estimator proposed by Calonico et al. (2014). See Appendix Table A1 for outcome means by grade. 

* Significant at α = 0.05; ** Significant at α = 0.01; † Significant after a False Discovery Rate correction, α = 0.05
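
The tables above flag estimates that remain significant after a False Discovery Rate correction. The specific procedure is not stated in this section, so the sketch below assumes the Benjamini-Hochberg step-up procedure, a common choice for FDR control. The function name and the p-values are hypothetical and do not correspond to any estimates in the tables.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, in ascending order of
    p-value) with p_(k) <= (k / m) * alpha, then reject ranks 1 through k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for a family of six estimates.
p_hypothetical = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
flags = benjamini_hochberg(p_hypothetical, alpha=0.05)
print(flags)
```

In this hypothetical family, the third and fourth p-values fall below 0.05 but are not rejected after the correction, which mirrors table entries that carry an asterisk without the additional FDR marker.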

Figure 1: Estimated Density of the Minimum Level 2-Scaled Summer Assessment Scores 

[Figure omitted. Plot title: Density of summer test scores.]

Note: Estimated density obtained using the McCrary (2008) procedure. 

Figure 2: Fraction Retained in Grade by Score on Summer Assessment 

[Figure omitted. Horizontal axis: Summer Assessment (0 = Level 2 cutoff), from -1 to 1.]

Note: Estimated for all grades and cohorts subject to the policy. Fitted line is estimated from a 2nd-degree 
polynomial specification with a bandwidth of 1. 

Figure 3: Effects of Grade Retention on Suspensions, Pooled Over All Grades 

[Figure omitted. Panel titles: Any suspension, Year 1; Days suspended, Year 1.]

Note: Estimated for all grades and cohorts subject to the policy. Fitted line is estimated from a 2nd-degree 
polynomial specification with a bandwidth of 1. 

Figure 4: Effects of Grade Retention on Attendance, Pooled Over All Grades 

[Figure omitted. Panel titles: Attendance rate, Year 1; Chronic absence, Year 1.]

Note: Estimated for all grades and cohorts subject to the policy. Fitted line is estimated from a 2nd-degree 
polynomial specification with a bandwidth of 1. 

Appendix 

Table A1: Outcome Means by Grade and Year Post-Retention 

                                Days         Attendance   Chronic
Grade  Year   Any Suspension    Suspended    Rate         Absence
3      1      0.031             0.250        0.916        0.317
       2      0.050             0.421        0.915        0.316
       3      0.079             0.845        0.910        0.335
4      1      0.057             0.398        0.920        0.291
       2      0.080             0.759        0.915        0.309
       3      0.109             1.882        0.901        0.362
5      1      0.082             0.987        0.905        0.358
       2      0.125             1.887        0.895        0.386
       3      0.153             2.674        0.875        0.444
6      1      0.148             2.664        0.901        0.341
       2      0.147             2.949        0.884        0.402
       3      0.142             [illegible]  0.854        0.459
7      1      0.178             3.422        0.865        0.470
       2      0.156             2.477        0.827        0.516
       3      0.147             1.823        0.780        0.575
8      1      0.175             2.521        0.801        0.560
       2      0.157             2.182        0.752        0.603
       3      0.125             [illegible]  0.715        0.642


